Steganographic Codes: a New Problem of
Coding Theory

Weiming Zhang, and Shiqu Li 

Abstract: To study how to design steganographic algorithms more
efficiently, this paper presents a new coding problem, steganographic
codes (abbreviated stego-codes). Stego-codes are defined over the field
with $q$ ($q \ge 2$) elements. First, a method of constructing linear
stego-codes is proposed by using the direct sum of vector subspaces.
The problem of linear stego-codes is then converted to an algebraic
problem by introducing the concept of the $t$th dimension of a vector
space. Some bounds on the length of stego-codes are obtained, from which
the maximum length embeddable (MLE) code is derived. It is shown that
there is a correspondence between MLE codes and perfect
error-correcting codes. Furthermore, the classification of all MLE codes
and a lower bound on the number of binary MLE codes are obtained from
the corresponding results on perfect codes. Finally, hiding redundancy
is defined to evaluate the performance of stego-codes.

Index Terms: steganography, stego-codes, error correcting codes,
matrix encoding, MLE codes, perfect codes, hiding redundancy.

I. Introduction 

Nowadays the security of communication means not only secrecy
but also concealment, so steganography is becoming more and more
popular in network communication. Steganography is about how
to send a secret message covertly by embedding it into innocuous
cover-objects such as digital images, audio and video. In this paper
we take images as the example to describe our ideas. Usually the
process of embedding a message makes some changes to the
cover-images. To reduce the possibility of detection, the sender hopes
to embed as many message bits as possible while changing as few
bits of the image as possible. This task can be accomplished through
an encoding technique first proposed by Crandall [1],
who called it matrix encoding. In the present paper we generalize
the idea of Crandall and formally define this kind of code as
"steganographic codes" (abbreviated stego-codes).

Besides increasing the embedding efficiency, stego-codes can also
enhance the security of steganography in other respects. Some
steganalysis methods can not only detect the existence
of a hidden message but also estimate its length very accurately [2],
[3]. There are even methods which can search for the stego-key [4].
However, if there are a great many stego-codes that can be selected
by the encoder as a part of the key, it will be very hard for the
attacker to estimate the message length or recover the stego-key. In
fact, Fridrich [4] pointed out that matrix encoding is an effective
measure against key search.

LSB (Least Significant Bit) steganography is the most popular
image steganographic technique. In simple LSB steganography the
encoder selects one pixel (or DCT coefficient) at a time and embeds
one bit of message in its LSB by a modification such as replacement
or ±1. This traditional technique embeds, on average, two message bits
per changed pixel, because for random data about 50% of the pixels
need not be changed. A better method is described in the CPT
scheme [5], [6], which is a steganographic algorithm on binary images
and can conceal as many as $k$ bits of data in a host image of size
$2^k - 1$ by changing at most 2 bits. Another, more effective example
of a stego-code is F5 [7], an LSB algorithm on JPEG images, which
was the first to implement Crandall's matrix encoding and can embed
$k$ bits of message in $2^k - 1$ DCT coefficients by changing at most
one of them.

To construct more effective stego-codes and study their properties,
in the present paper we define linear stego-codes over the finite field
with $q$ ($q \ge 2$) elements by using multi-output logic functions.
First, as an example, a constructive method for linear stego-codes is
proposed, which generates the codes of F5 as a special case and is more
flexible than the codes of CPT. To study bounds on the length of linear
stego-codes, we introduce the definition of the $t$th dimension of a
vector space, which converts the problems of linear stego-codes to
purely algebraic problems. A bound on the length of linear stego-codes
is then obtained, from which we derive the maximum length embeddable
(abbreviated MLE) codes. Furthermore, it is shown that there is a
1-1 correspondence between linear MLE codes and linear perfect
error-correcting codes.

To study nonlinear stego-codes, another, direct definition of
stego-codes is presented, based on which we explain the relations
and differences between stego-codes and error-correcting codes in
geometrical language and generalize linear MLE codes to the nonlinear
case. We prove the relations between MLE codes and perfect codes
with two constructive proofs, which can be used to construct MLE
codes from perfect codes or perfect codes from MLE codes.
Furthermore, from the well-known results on perfect codes, the
classification of all MLE codes and a lower bound on the number of
binary MLE codes are obtained.

Usually a steganographic algorithm is evaluated by both its message
rate and its change density. A large message rate and a small change
density indicate a good algorithm. To evaluate the performance of
stego-codes more accurately, we introduce the concept of hiding
redundancy, which can be viewed as a combination of message rate and
change density. Furthermore, based on the result on hiding redundancy,
another bound on the length of binary stego-codes is obtained.

The rest of the paper is organized as follows. The construction and
properties of linear stego-codes are analyzed in Sect. II. Nonlinear
stego-codes and the relations between MLE codes and perfect
codes are studied in Sect. III. In Sect. IV a measure, hiding
redundancy, is proposed to evaluate the efficiency of stego-codes. The
paper concludes with a discussion in Sect. V.

II. Linear Stego-codes 

A. Definitions 

To deal with the concepts that are introduced we adopt some
commonly used notational conventions. The finite field with
$q$ elements is denoted by GF(q). A vector is denoted by a bold italic
letter (e.g. $x$). A set is denoted by a script letter (e.g.
$\mathcal{S}$). The Hamming weight of a vector $x \in GF^n(q)$ is
denoted by $\mathrm{Wt}(x)$.

For simplicity, we take LSB steganography on images as the example
to describe the definitions and applications of stego-codes.

Definition 1: An $(n, k, t)$ stego-coding function over the finite field
GF(q) is a vectorial function $H(x) = (h_1(x), h_2(x), \ldots, h_k(x)) :
GF^n(q) \to GF^k(q)$ satisfying the following condition: for any
given $x \in GF^n(q)$ and $y \in GF^k(q)$, there exists a $z \in GF^n(q)$
such that $\mathrm{Wt}(z) \le t$ and $H(x + z) = y$. $H(x)$ is called a
linear stego-coding function if every component function $h_i(x)$
($1 \le i \le k$) is a linear function.

Definition 2: Let $H(x)$ be an $(n, k, t)$ stego-coding function over
GF(q). For $y \in GF^k(q)$, let $H^{-1}(y) = \{x : H(x) = y\}$.

Then call

$$\mathcal{S} = \{H^{-1}(y) : y \in GF^k(q) \text{ and } H^{-1}(y) \ne \emptyset\}$$

an $(n, k, t)$ stego-code.

A stego-coding function is in principle the decoding function; to
hide a message with it, one also needs an encoding algorithm. Generally,
the encoding algorithm can be implemented through an encoding table B.
For an $(n, k, t)$ stego-coding function $H(x)$ over GF(q), the encoding
table B is a $q^n \times q^k$ matrix whose rows are indexed by
$x \in GF^n(q)$ and whose columns are indexed by $y \in GF^k(q)$.
Position $(x, y)$ stores a vector $z \in GF^n(q)$ such that
$\mathrm{Wt}(z) \le t$ and $H(x + z) = y$. If $H(x)$ is a linear
stego-coding function, then because $H(x + z) = H(x) + H(z)$, one only
needs to construct a $1 \times q^k$ encoding table whose columns are
indexed by $y \in GF^k(q)$. Position $y$ stores a vector
$x \in GF^n(q)$ such that $\mathrm{Wt}(x) \le t$ and $H(x) = y$.
Therefore linear stego-codes generally admit simpler encoding
algorithms. Crandall points out that the design of fast encoding
algorithms is also an open research area [1]. The following
example shows an elegant encoding method.

Example 1 (F5-Matrix Coding): F5 [7] is an LSB steganographic
program that embeds binary message sequences into the LSBs of
DCT coefficients of JPEG images. F5 can embed $k$ bits of message in
$2^k - 1$ DCT coefficients by changing at most one of them. The inputs
are the code word (LSBs of DCT coefficients) $x \in GF^{2^k-1}(2)$ and
the block of message $y \in GF^k(2)$. The coding function is defined as

$$f(x) = \bigoplus_{i=1}^{2^k-1} x_i \cdot i, \tag{1}$$

where, to do $\oplus$, the integer $x_i \cdot i$ is interpreted as a
binary vector. The encoding procedure is as follows: compute the bit
place that has to be changed as $s = y \oplus f(x)$, where the resulting
binary vector $s$ is interpreted as an integer. Then output the changed
code word

$$x' = \begin{cases} x & \text{if } s = 0, \\ (x_1, x_2, \ldots, x_s \oplus 1, \ldots, x_{2^k-1}) & \text{if } s \ne 0, \end{cases}$$

which satisfies $y = f(x')$.

According to Definition 1, the function $f$ in (1) is in fact a
$(2^k - 1, k, 1)$ linear stego-coding function over GF(2). For instance,
when $k = 2$, (1) is equivalent to the vectorial function
$H(x) = (h_1(x), h_2(x))$ where $h_1(x) = x_2 \oplus x_3$ and
$h_2(x) = x_1 \oplus x_3$. The corresponding stego-code is

S = { {(000), (111)}, {(011), (100)},

{(010), (101)}, {(001), (110)} } .
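
As an illustration of Example 1, the following Python sketch implements the coding function (1) and the one-change embedding rule. The function names `f5_hash`, `f5_embed` and `stego_code` are hypothetical and chosen for this sketch only; messages are represented as integers in the range 0 to $2^k - 1$ rather than as bit vectors, which is an assumption made for brevity.

```python
from itertools import product

def f5_hash(x):
    """Coding function (1): XOR of the 1-based indices i with x_i = 1."""
    s = 0
    for i, bit in enumerate(x, start=1):
        if bit:
            s ^= i
    return s

def f5_embed(x, y):
    """Return x' that differs from x in at most one bit and hashes to y."""
    s = y ^ f5_hash(x)          # position to flip; 0 means "no change"
    x_new = list(x)
    if s != 0:
        x_new[s - 1] ^= 1
    return tuple(x_new)

def stego_code(k):
    """Group all words of length 2^k - 1 by their hash value."""
    classes = {}
    for x in product((0, 1), repeat=2 ** k - 1):
        classes.setdefault(f5_hash(x), []).append(x)
    return classes
```

For $k = 2$, `stego_code(2)` groups the eight words of length 3 into the four classes listed in Example 1.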

The CPT scheme [5], [6] is an example of a nonlinear $(2^k - 1, k, 2)$
stego-coding function. We first study linear stego-coding functions,
which have the following necessary and sufficient condition.

Theorem 1: A linear vectorial function $H(x)$ over GF(q) is an
$(n, k, t)$ stego-coding function if and only if for any given
$y \in GF^k(q)$, there exists a $z \in GF^n(q)$ such that
$\mathrm{Wt}(z) \le t$ and $H(z) = y$.

Proof: If $H(x)$ is a linear stego-coding function over GF(q),
Definition 1 implies that for any given $y \in GF^k(q)$ and
$0 \in GF^n(q)$, there exists a $z \in GF^n(q)$ such that
$\mathrm{Wt}(z) \le t$ and $y = H(0 + z) = H(z)$.

Conversely, for any given $x \in GF^n(q)$ and $y \in GF^k(q)$ there
exists a $z \in GF^n(q)$ such that $\mathrm{Wt}(z) \le t$ and
$H(z) = y - H(x)$, i.e. $H(x + z) = y$ because $H(x)$ is a linear
function. Therefore $H(x)$ satisfies the condition of Definition 1. ■

An $(n, k, t)$ linear vectorial function $H(x) =
(h_1(x), h_2(x), \ldots, h_k(x))$ over GF(q), where $h_i(x) =
a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n$ ($1 \le i \le k$), can be
represented by a $k \times n$ matrix over GF(q) such as

$$H = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{k1} & a_{k2} & \cdots & a_{kn}
\end{pmatrix}.$$

We call $H$ an $(n, k, t)$ stego-coding matrix. There is a 1-1
correspondence between linear stego-coding functions and stego-coding
matrices. From Theorem 1 we can define the stego-coding matrix directly
as follows.

Definition 3: A $k \times n$ matrix $H$ over GF(q) is called an
$(n, k, t)$ stego-coding matrix if for any given $y \in GF^k(q)$, there
exists an $x \in GF^n(q)$ such that $\mathrm{Wt}(x) \le t$ and
$Hx^{\mathrm{tr}} = y^{\mathrm{tr}}$.

If $H$ is an $(n, k, t)$ stego-coding matrix over GF(q), then for any
$y \in GF^k(q)$, the equation $Hx^{\mathrm{tr}} = y^{\mathrm{tr}}$ has
solutions, which implies that the rank of $H$ is $k$. From Definition 3
we get the following important property, which is useful for the
construction of linear stego-coding functions.

Theorem 2: A $k \times n$ matrix $H$ over GF(q) is an $(n, k, t)$
stego-coding matrix if and only if, for any $y \in GF^k(q)$,
$y^{\mathrm{tr}}$ is a linear combination of at most $t$ columns of $H$.
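
For small binary matrices the criterion of Theorem 2 can be checked by brute force. The sketch below is a minimal illustration for $q = 2$ only, where a linear combination of at most $t$ columns is simply the XOR of at most $t$ columns. The names `syndromes_within_t` and `is_stego_matrix`, and the encoding of each column as an integer bit mask, are assumptions of this sketch. The dictionary it builds also plays the role of the $1 \times q^k$ encoding table for linear codes described in Subsection II-A.

```python
from itertools import combinations

def syndromes_within_t(columns, t):
    """Map every syndrome reachable with <= t columns to a minimum-weight
    tuple of column indices. `columns` holds the columns of H as integers."""
    table = {0: ()}
    for w in range(1, t + 1):                 # increasing weight
        for idx in combinations(range(len(columns)), w):
            s = 0
            for j in idx:
                s ^= columns[j]               # addition over GF(2)
            table.setdefault(s, idx)          # keep the lightest solution
    return table

def is_stego_matrix(columns, k, t):
    """Theorem 2 for q = 2: all 2^k syndromes must be reachable."""
    return len(syndromes_within_t(columns, t)) == 2 ** k
```

For example, the three columns `[1, 2, 3]` (the F5 matrix for $k = 2$) pass the check with $t = 1$, while the two columns `[1, 2]` need $t = 2$.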

B. A Constructing Method of Linear Stego-coding Functions 

Theorem 2 suggests that we can construct a stego-coding matrix
through the direct sum of vector subspaces. To do that, we need
the following lemma.

Lemma 3: If $V$ is a $k$-dimensional vector space over GF(q), then
there exist $\frac{q^k-1}{q-1}$ vectors
$x_1, \ldots, x_{(q^k-1)/(q-1)}$ satisfying the following properties:

1) Any two of the $\frac{q^k-1}{q-1}$ vectors are linearly independent.

2) For any given $y \in V$, there exist $a \in GF(q)$ and $x_i$ such that
$1 \le i \le \frac{q^k-1}{q-1}$ and $y = ax_i$.

Proof: Take any nonzero vector $x_1 \in V$, and denote the
1-dimensional subspace spanned by $x_1$ as $V_1$; then take any nonzero
vector $x_2 \in V \setminus V_1$ and denote the 1-dimensional subspace
spanned by $x_2$ as $V_2$; then take any nonzero vector
$x_3 \in V \setminus (V_1 \cup V_2)$, and so on. Continuing in this way
we finally get $\frac{q^k-1}{q-1}$ 1-dimensional subspaces
$V_1, \ldots, V_{(q^k-1)/(q-1)}$, because the number of nonzero vectors
in $V$ is $q^k - 1$ and every 1-dimensional subspace consists of $q - 1$
nonzero vectors and the zero vector. Assume that subspace $V_i$ is
spanned by $x_i$ ($1 \le i \le \frac{q^k-1}{q-1}$). The procedure of
constructing these subspaces implies that any two of these $x_i$'s are
linearly independent and $V = V_1 \cup V_2 \cup \cdots \cup
V_{(q^k-1)/(q-1)}$. Therefore for any given $y \in V$, there is a $V_i$
satisfying $y \in V_i$, which means there exists $a \in GF(q)$ such that
$y = ax_i$. ■

Based on Lemma 3 we get the following algorithm for constructing a
$\left(\sum_{i=1}^{t} \frac{q^{k_i}-1}{q-1}, k, t\right)$ stego-coding
matrix over GF(q).

Algorithm 1: The construction goes through the following three steps.

S1 Take a basis of the $k$-dimensional vector space $GF^k(q)$ over
GF(q), such as $\{x_1, x_2, \ldots, x_k\}$.

S2 Divide $\{x_1, x_2, \ldots, x_k\}$ into $t$ disjoint subsets $B_i$
($1 \le i \le t$) such that $B_i$ consists of $k_i$ vectors and
$\sum_{i=1}^{t} k_i = k$. Denote the $k_i$-dimensional subspace spanned
by $B_i$ as $V_i$, $1 \le i \le t$.

S3 As in the proof of Lemma 3, take $\frac{q^{k_i}-1}{q-1}$ nonzero
vectors from every subspace $V_i$ ($1 \le i \le t$). We get
$\sum_{i=1}^{t} \frac{q^{k_i}-1}{q-1}$ nonzero vectors in all. Then
construct a $k \times \sum_{i=1}^{t} \frac{q^{k_i}-1}{q-1}$ matrix $H$
with all of these nonzero vectors as columns. $H$ is a
$\left(\sum_{i=1}^{t} \frac{q^{k_i}-1}{q-1}, k, t\right)$ stego-coding
matrix over GF(q).

In fact, by Lemma 3, for any subspace $V_i$ and any vector $x \in V_i$
in Algorithm 1, there exists a column of $H$ which can linearly
express $x^{\mathrm{tr}}$. On the other hand, $GF^k(q)$ is the direct
sum of these $t$ subspaces $V_i$. Combining these two facts, it can be
proved that, for any $y \in GF^k(q)$, $y^{\mathrm{tr}}$ is a linear
combination of at most $t$ columns of $H$. Therefore by Theorem 2, $H$
is a $\left(\sum_{i=1}^{t} \frac{q^{k_i}-1}{q-1}, k, t\right)$
stego-coding matrix over GF(q).

Let $q = 2$ and $t = 1$; with Algorithm 1 we can construct
$(2^k - 1, k, 1)$ linear stego-coding functions over GF(2), which are
just the functions used in F5 (Example 1).
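
A minimal sketch of Algorithm 1 for the binary case may help make the construction concrete. It assumes $q = 2$, takes the standard basis of $GF^k(2)$ in step S1, and splits it into consecutive blocks in step S2; these choices, the integer bit-mask encoding of columns, and the names `algorithm1_binary` and `covers_all` are assumptions of this sketch, not part of the algorithm as stated. For $q = 2$ every 1-dimensional subspace has exactly one nonzero vector, so step S3 takes all nonzero vectors of each $V_i$.

```python
from itertools import combinations

def algorithm1_binary(block_sizes):
    """Columns (as k-bit integers) of the matrix built by Algorithm 1, q = 2.

    block_sizes = [k_1, ..., k_t] with k_1 + ... + k_t = k.
    """
    columns, offset = [], 0
    for k_i in block_sizes:
        # All nonzero vectors of the subspace spanned by basis block B_i.
        for v in range(1, 2 ** k_i):
            columns.append(v << offset)
        offset += k_i
    return columns

def covers_all(columns, k, t):
    """Check Theorem 2: every vector is an XOR of at most t columns."""
    reachable = {0}
    for w in range(1, t + 1):
        for subset in combinations(columns, w):
            s = 0
            for c in subset:
                s ^= c
            reachable.add(s)
    return len(reachable) == 2 ** k
```

With `block_sizes = [2, 2]` the sketch yields a matrix with $3 + 3 = 6$ columns that is a $(6, 4, 2)$ stego-coding matrix, and with `block_sizes = [k]` it yields the $2^k - 1$ columns of the F5 matrix.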

C. The tth Dimension of Vector Space - Bounds on the length of 
Linear Stego-codes 

To study bounds on the length of stego-codes, we generalize the 
concept of vector space's dimension to define the tth dimension. 

Definition 4: If $V$ is a vector space over a field $F$,
$x, x_1, x_2, \ldots, x_n \in V$ and there are
$a_1, a_2, \ldots, a_n \in F$ such that
$\mathrm{Wt}((a_1, a_2, \ldots, a_n)) \le t$ and
$x = \sum_{i=1}^{n} a_i x_i$, we say that $x$ can be expressed as a
$t$th linear combination of the $x_i$'s. If every $x \in V$ can be
expressed as a $t$th linear combination of the $x_i$'s, we say that
$\{x_1, x_2, \ldots, x_n\}$ is a set of $t$th generators of $V$.

Definition 5: Let $V$ be a vector space over a field $F$ and let
$\{x_1, x_2, \ldots, x_n\}$ be a set of $t$th generators of $V$. If any
other set of $t$th generators $\{y_1, y_2, \ldots, y_m\}$ must satisfy
$m \ge n$, we call $\{x_1, x_2, \ldots, x_n\}$ a minimum set of $t$th
generators of $V$ and call $n$ the $t$th dimension of $V$.

In terms of the $t$th dimension, Theorem 2 can be stated in the
following form.

Theorem 4: A $k \times n$ matrix $H$ is an $(n, k, t)$ stego-coding
matrix over GF(q) if and only if the set consisting of the $n$ vectors
corresponding to the $n$ columns of $H$ is a set of $t$th generators of
$GF^k(q)$.

Because a set of $t$th generators must be a set of $(t+1)$th generators,
it is clear that for the vector space $GF^k(q)$ and $t$ such that
$t \ge k$, the $t$th dimension is $k$, and every basis of $GF^k(q)$ is a
minimum set of $t$th generators of $GF^k(q)$. In fact the $t$th
dimension of $GF^k(q)$ with $t > k$ is insignificant for the problem of
stego-codes.

The following theorem is easy to obtain but important, because
it converts the problem of linear stego-codes to a purely algebraic
problem.

Theorem 5: If the $t$th dimension of the vector space $GF^k(q)$ over
GF(q) is $n$, then for any integer $m \ge n$ there exist $(m, k, t)$
linear stego-codes.

From Theorem 5 we know that the key problems of linear stego-codes
are how to estimate the $t$th dimension of $GF^k(q)$ and how
to construct a minimum set of $t$th generators of $GF^k(q)$. Generally,
it is hard to get the exact $t$th dimension of $GF^k(q)$, but we can
obtain some bounds on it, which are also bounds on the length of linear
stego-codes.

Theorem 6: If the $t$th dimension of the vector space $GF^k(q)$ over
GF(q) is $n$, then

$$q^k \le 1 + (q-1)\binom{n}{1} + (q-1)^2\binom{n}{2} + \cdots + (q-1)^t\binom{n}{t}. \tag{2}$$

Proof: Assume that $\{x_1, x_2, \ldots, x_n\}$ is a set of $t$th
generators of $GF^k(q)$. Then any $x \in GF^k(q)$ can be expressed as a
$t$th linear combination of $\{x_1, x_2, \ldots, x_n\}$. On the other
hand, there are in total
$1 + (q-1)\binom{n}{1} + (q-1)^2\binom{n}{2} + \cdots + (q-1)^t\binom{n}{t}$
$t$th linear combinations of $\{x_1, x_2, \ldots, x_n\}$ and $q^k$
vectors in $GF^k(q)$. Therefore, we get the inequality (2). ■
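
The bound of Theorem 6 is easy to evaluate numerically. The following sketch computes the number of $t$th linear combinations (the right-hand side of the bound) and the smallest $n$ that does not violate it; `ball_size` and `min_length` are hypothetical helper names. Note that the returned value is only a lower bound on the $t$th dimension; the sketch does not claim that a code of this length exists.

```python
from math import comb

def ball_size(n, t, q):
    """1 + (q-1)C(n,1) + ... + (q-1)^t C(n,t)."""
    return sum((q - 1) ** i * comb(n, i) for i in range(t + 1))

def min_length(k, t, q):
    """Smallest n with q^k <= ball_size(n, t, q)."""
    n = 1
    while ball_size(n, t, q) < q ** k:
        n += 1
    return n
```

For instance, `min_length(3, 1, 2)` gives 7, the length of the F5 code with $k = 3$, and `ball_size(23, 3, 2)` equals $2^{11}$, so a binary $(23, 11, 3)$ code would meet the bound with equality.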

As mentioned above, the $k$th dimension of the vector space $GF^k(q)$
over GF(q) is $k$, so when $t = k$ the equality holds in (2). The
following corollary shows that the equality also holds in (2) with
$t = 1$.

Corollary 7: The first dimension of the vector space $GF^k(q)$ over
GF(q) is $\frac{q^k-1}{q-1}$, and any set consisting of
$\frac{q^k-1}{q-1}$ vectors such that any two of them are linearly
independent is a minimum set of first generators.

Proof: For any given $x_1, \ldots, x_{(q^k-1)/(q-1)} \in GF^k(q)$ such
that any two of them are linearly independent, the proof of Lemma 3
means that $\{x_1, \ldots, x_{(q^k-1)/(q-1)}\}$ is a set of first
generators of $GF^k(q)$. Because the equality in (2) holds when
$n = \frac{q^k-1}{q-1}$ and $t = 1$, $\{x_1, \ldots,
x_{(q^k-1)/(q-1)}\}$ is a minimum set of first generators. Therefore
the first dimension of the vector space $GF^k(q)$ is
$\frac{q^k-1}{q-1}$. ■

By Lemma 3, Corollary 7 and Theorem 4, for any $q \ge 2$ and
$k \ge 1$, $\left(\frac{q^k-1}{q-1}, k, 1\right)$ linear stego-codes
over GF(q) exist, and when $q = 2$ we get the codes of F5 once more.

By Theorems 4 and 6, an $(n, k, t)$ linear stego-code over GF(q)
must satisfy (2), which provides an upper bound on the embedded
message length. Therefore when equality holds in (2), we get an
important type of code.

Definition 6: An $(n, k, t)$ linear stego-code over GF(q) is called
maximum length embeddable (abbreviated MLE) if equality holds in (2).

Note that the form of the bound in Theorem 6 is similar to that
of the Hamming bound on error-correcting codes.

Lemma 8 (Hamming Bound): A $t$-error-correcting $(n, k)$ linear
code over GF(q) must satisfy

$$q^{n-k} \ge 1 + (q-1)\binom{n}{1} + (q-1)^2\binom{n}{2} + \cdots + (q-1)^t\binom{n}{t}. \tag{3}$$

Error-correcting codes are called perfect codes when equality holds
in (3). Crandall's examples [1], which are obtained from perfect
codes, are just linear MLE codes. The following theorem shows
the relation between linear MLE codes and linear perfect codes.

Theorem 9: An $(n-k) \times n$ matrix $H$ is the parity check matrix of
a $t$-error-correcting perfect $(n, k)$ code over GF(q) if and only if
$H$ is a stego-coding matrix of an $(n, n-k, t)$ MLE code over GF(q).

Proof: If $H$ is the parity check matrix of a $t$-error-correcting
code, any two $t$th linear combinations of the $n$ columns of $H$ are
different. Because $H$ is the parity check matrix of a perfect code
over GF(q), the number of all $t$th linear combinations of the columns
of $H$ satisfies

$$1 + (q-1)\binom{n}{1} + (q-1)^2\binom{n}{2} + \cdots + (q-1)^t\binom{n}{t} = q^{n-k}. \tag{4}$$

That means that the set consisting of the vectors corresponding to the
$n$ columns of $H$ is a set of $t$th generators of $GF^{n-k}(q)$. By
Theorem 4, $H$ is an $(n, n-k, t)$ stego-coding matrix. Furthermore,
(4) implies that $H$ is a stego-coding matrix of an MLE code over
GF(q).

Conversely, assume $H$ is an $(n, n-k, t)$ stego-coding matrix of
an MLE code over GF(q). As mentioned in Subsection II-A, the rank
of $H$ is $n - k$, which implies that $H$ is a parity check matrix of an
$(n, k)$ linear error-correcting code. By Theorem 4, the set of
vectors corresponding to the $n$ columns of $H$ is a set of $t$th
generators of $GF^{n-k}(q)$, which, with the fact that (4) holds by
Definition 6, implies that any two $t$th linear combinations of the $n$
columns of $H$ are different. Therefore the linear code with $H$ as
parity check matrix can correct $t$ errors. Once more by the fact that
(4) holds, $H$ is the parity check matrix of a perfect code over
GF(q). ■

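Example 2: Hamming codes are linear single-error-correcting
codes. With the easy decoding method for Hamming codes, we
get an easy encoding method for the corresponding stego-codes. For
instance, when $q = 2$ and $k = 3$, the parity check matrix of the
binary (7,4) Hamming code is

$$H = \begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1
\end{pmatrix},$$

which is just a (7,3,1) stego-coding matrix and can hide 3 bits of
message in a codeword of length 7 bits by changing at most 1 bit.
Here we have taken the columns in the natural order of increasing
binary numbers. For instance, when the inputs are the codeword
$x = (1, 0, 0, 1, 0, 0, 0)$ and the message $y = (1, 1, 0)$, compute

$$Hx^{\mathrm{tr}} \oplus y^{\mathrm{tr}} = (1, 0, 1)^{\mathrm{tr}} \oplus (1, 1, 0)^{\mathrm{tr}} = (0, 1, 1)^{\mathrm{tr}}.$$

Note that the result is the binary representation of 3 and is also just
the third column of $H$. Then change the third position of $x$ to output
$x' = (1, 0, 1, 1, 0, 0, 0)$, which satisfies

$$Hx'^{\mathrm{tr}} = Hx^{\mathrm{tr}} \oplus (0, 1, 1)^{\mathrm{tr}} = (1, 0, 1)^{\mathrm{tr}} \oplus (0, 1, 1)^{\mathrm{tr}} = (1, 1, 0)^{\mathrm{tr}} = y^{\mathrm{tr}}.$$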
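
The computation in Example 2 can be sketched in a few lines of Python. The matrix below assumes, as the text states, that the columns of $H$ are the binary representations of 1 to 7 in increasing order; the names `syndrome` and `embed` are hypothetical.

```python
# Parity check matrix of the binary (7,4) Hamming code; column j is the
# binary representation of j (most significant bit in the first row).
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def syndrome(x):
    """Compute H x^tr over GF(2)."""
    return tuple(sum(h * b for h, b in zip(row, x)) % 2 for row in H)

def embed(x, y):
    """Hide the 3-bit message y in the 7-bit word x by flipping <= 1 bit."""
    s = tuple(a ^ b for a, b in zip(syndrome(x), y))
    pos = int("".join(map(str, s)), 2)   # 0 means x already carries y
    x_new = list(x)
    if pos:
        x_new[pos - 1] ^= 1              # flip the bit under column `pos`
    return tuple(x_new)
```

Calling `embed((1, 0, 0, 1, 0, 0, 0), (1, 1, 0))` flips the third position, as in the worked example, and the receiver recovers the message with `syndrome`.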



In fact we can obtain another bound on the $t$th dimension of the
vector space $GF^k(q)$ by Algorithm 1.

Theorem 10: If the $t$th dimension of the vector space $GF^k(q)$ over
GF(q) is $n$, then

$$n \le \frac{\left(q^{\lfloor k/t \rfloor} - 1\right)(t-1) + q^{k - \lfloor k/t \rfloor (t-1)} - 1}{q-1}. \tag{5}$$

Because (5) is an upper bound on the $t$th dimension of the vector space
$GF^k(q)$, Theorem 5 implies that for any positive integer $n$ such that

$$n \ge \frac{\left(q^{\lfloor k/t \rfloor} - 1\right)(t-1) + q^{k - \lfloor k/t \rfloor (t-1)} - 1}{q-1},$$

$(n, k, t)$ linear stego-codes over GF(q) exist.
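
To get a feel for this upper bound, the sketch below evaluates it under the assumption that it is the length produced by Algorithm 1 with $t - 1$ blocks of size $\lfloor k/t \rfloor$ and one block holding the remaining $k - \lfloor k/t \rfloor (t-1)$ basis vectors. The function name `theorem10_bound` is hypothetical, and the sketch assumes $1 \le t \le k$.

```python
def theorem10_bound(q, k, t):
    """Length of the Algorithm 1 matrix with t-1 blocks of size floor(k/t)
    and one block of size k - floor(k/t)*(t-1)."""
    m = k // t
    numerator = (q ** m - 1) * (t - 1) + q ** (k - m * (t - 1)) - 1
    return numerator // (q - 1)   # each term is divisible by q - 1
```

For $q = 2$, $k = 4$, $t = 2$ this gives 6, and for $t = k$ it reduces to $k$, the size of a basis.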

III. Nonlinear Stego-codes 

A. Definitions 

Definition 2 of stego-codes is based on the stego-coding function.
In fact we can define stego-codes directly as follows, which is useful
for studying nonlinear stego-codes.

The Hamming distance between two vectors $x$ and $y \in GF^n(q)$
is denoted by $\mathrm{Dist}(x, y)$.

Definition 7: By an $M$-partition of $GF^n(q)$, we mean a set
$\{I_0, I_1, \ldots, I_{M-1}\}$ satisfying the following two conditions:

1) $I_0, I_1, \ldots, I_{M-1}$ are non-empty subsets of $GF^n(q)$ and
any two of the $M$ subsets are disjoint;

2) $GF^n(q) = I_0 \cup I_1 \cup \cdots \cup I_{M-1}$.

Definition 8: If $I$ is a nonempty subset of $GF^n(q)$ and
$x \in GF^n(q)$, define the distance between $x$ and $I$ as
$\mathrm{Dist}(x, I) = \min_{y \in I} \mathrm{Dist}(x, y)$.

Definition 9: An $(n, M, t)$ stego-code over GF(q) is a set
$\mathcal{S} = \{I_0, I_1, \ldots, I_{M-1}\}$ satisfying the following
two conditions:

1) $\{I_0, I_1, \ldots, I_{M-1}\}$ is an $M$-partition of $GF^n(q)$.

2) For any $x \in GF^n(q)$ and any $i$ such that $0 \le i \le M - 1$,
$\mathrm{Dist}(x, I_i) \le t$.
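
Definition 9 can be checked directly for small parameters. The following brute-force sketch represents vectors as tuples over `range(q)` and a candidate code as a list of sets of such tuples; the names `dist` and `is_stego_code` are hypothetical, and the exhaustive search is only practical for very small $n$.

```python
from itertools import product

def dist(x, y):
    """Hamming distance between two equal-length tuples."""
    return sum(a != b for a, b in zip(x, y))

def is_stego_code(classes, n, t, q=2):
    """Check the two conditions of Definition 9 by exhaustive search."""
    space = set(product(range(q), repeat=n))
    seen = set()
    for c in classes:
        c = set(c)
        if not c or seen & c:        # empty class or overlapping classes
            return False
        seen |= c
    if seen != space:                # the classes must cover GF^n(q)
        return False
    # Every vector must lie within distance t of every class.
    return all(min(dist(x, y) for y in c) <= t
               for x in space for c in classes)
```

Applied to the four classes of the $k = 2$ code of Example 1, the check succeeds for $t = 1$ and fails for $t = 0$.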

For an $(n, M, t)$ stego-code $\mathcal{S} = \{I_0, I_1, \ldots,
I_{M-1}\}$ over GF(q), a corresponding stego-coding function can be
constructed as follows. Let $m = \lceil \log_q M \rceil$, so that the
$M$ message symbols can be expressed by $M$ vectors in $GF^m(q)$, for
example $y_0, \ldots, y_{M-1}$. Define a function
$H : GF^n(q) \to GF^m(q)$ such that $H(x) = y_i$ if $x \in I_i$, where
$0 \le i \le M - 1$. Then with $H$ as decoding function, Definition 9
implies that for any given message $y \in \{y_0, \ldots, y_{M-1}\}$ and
codeword $x \in GF^n(q)$, $y$ can be hidden in $x$ (i.e. expressed by
$H(x')$) by changing at most $t$ elements of $x$. Herein $H$ is a
vectorial function. If every component function of $H$ is a linear
function, we call $H$ a linear stego-coding function and call the
corresponding code $\mathcal{S} = \{I_0, I_1, \ldots, I_{M-1}\}$ a
linear stego-code. For a linear stego-coding function $H$, if the rank
of its coefficient matrix is $k$, then
$|I_0| = |I_1| = \cdots = |I_{M-1}| = q^{n-k}$, which means that
$M = q^k$. Therefore a linear stego-code can be simply denoted by
$(n, k, t)$, as in Sect. II.

We say that two $(n, M, t)$ stego-codes $\mathcal{S} = \{I_0, I_1,
\ldots, I_{M-1}\}$ and $\mathcal{T} = \{J_0, J_1, \ldots, J_{M-1}\}$
over GF(q) are equivalent if there is a permutation $\pi$ of the $n$
coordinate positions and $n$ permutations $\sigma_1, \ldots, \sigma_n$
of the $q$ elements such that for any $i$ ($0 \le i \le M - 1$), there
exists $j$ ($0 \le j \le M - 1$) satisfying
$\pi(\sigma_1(x_1), \ldots, \sigma_n(x_n)) \in I_i$
if $(x_1, \ldots, x_n) \in J_j$.

The conclusion in Subsection II-C implies that there are relations
between stego-codes and error-correcting codes. The general
definition of error-correcting codes, including linear and nonlinear
codes, is as follows.

Definition 10 [8]: An $(n, M, d)$ error-correcting code over GF(q)
is a set of $M$ vectors of $GF^n(q)$ such that any two vectors differ in
at least $d$ places, and $d$ is the largest number with this property.

To understand the relations and differences between
error-correcting codes and stego-codes, we think of these codes
geometrically, as MacWilliams did in [8]. The vector
$(a_1, a_2, \ldots, a_n)$ of length $n$ gives the coordinates of a
vertex of a unit cube in $n$ dimensions. Then an $(n, M, d)$
error-correcting code is just a subset of these vertices, while an
$(n, M, t)$ stego-code is a partition of these vertices.

In this geometrical language, the error-correcting coding theory
problem is to choose as many vertices of the cube as possible while
keeping them a certain distance $d$ apart. The stego-coding
theory problem, however, is to divide the vertices of the cube into as
many disjoint non-empty subsets as possible while keeping every vertex
close to every subset. In fact, an $(n, M, t)$ stego-code makes the
sphere of radius $t$ around any vertex intersect all $M$ subsets.

B. Maximum Length Embeddable (MLE) Codes 

With Definition 9 of stego-codes, we can generalize Theorem 6
and Definition 6 as the following Theorem 11 and Definition 11.

Theorem 11: An $(n, M, t)$ stego-code over GF(q) must satisfy

$$M \le 1 + (q-1)\binom{n}{1} + (q-1)^2\binom{n}{2} + \cdots + (q-1)^t\binom{n}{t}. \tag{6}$$

Proof: Let $\mathcal{S} = \{I_0, I_1, \ldots, I_{M-1}\}$ be an
$(n, M, t)$ stego-code over GF(q). Then for any given $x \in GF^n(q)$,
the sphere of radius $t$ around $x$ must intersect every $I_i$
($0 \le i \le M - 1$). Note that this sphere contains
$1 + (q-1)\binom{n}{1} + (q-1)^2\binom{n}{2} + \cdots + (q-1)^t\binom{n}{t}$
vectors and these $M$ subsets $I_i$ are disjoint, and then we get the
inequality (6). ■

Definition 11: An $(n, M, t)$ stego-code over GF(q) is called
maximum length embeddable (abbreviated MLE) if equality holds in (6).

MLE codes have the following two interesting properties, the first of
which can be obtained directly from the definitions of stego-codes and
MLE codes.

Lemma 12: If $\mathcal{S} = \{I_0, I_1, \ldots, I_{M-1}\}$ is an MLE
$(n, M, t)$ stego-code over GF(q), then for any $x \in GF^n(q)$, the
sphere of radius $t$ around $x$ shares exactly one vector with every
$I_i$ ($0 \le i \le M - 1$).

Lemma 13: For an MLE $(n, M, t)$ code over GF(q), there
exists some integer $k$ such that $M = q^k$.

Proof: Let $\mathcal{S} = \{I_0, I_1, \ldots, I_{M-1}\}$ be an
$(n, M, t)$ MLE stego-code over GF(q). Then for any subset $I_i$
($0 \le i \le M - 1$) and any $x \in I_i$, Lemma 12 implies that, in any
$I_j$ ($0 \le j \le M - 1$, $j \ne i$), there is only one vector,
denoted by $y$, satisfying $\mathrm{Dist}(x, y) \le t$. Therefore the
mapping $f : I_i \to I_j$ such that $f(x) = y$ if
$\mathrm{Dist}(x, y) \le t$ is a 1-1 correspondence between $I_i$ and
$I_j$. So there exists an integer $\lambda$ such that
$|I_0| = \cdots = |I_{M-1}| = \lambda$. Assume that the characteristic
of the field GF(q) is $p$ and $q = p^r$; then

$$\lambda M = q^n.$$

Therefore there exists some integer $j$ such that $\lambda = p^j$, and

$$\lambda \sum_{i=0}^{t} \binom{n}{i}(q-1)^i = q^n = p^{nr}.$$

Hence the sum equals $p^{nr-j}$, and since it is congruent to 1 modulo
$q - 1$, $q - 1 = p^r - 1$ divides $p^{nr-j} - 1$, which implies that
$r$ divides $j$ and $M$ is a power of $q$. ■

In Subsection II-C we have proved that there is a 1-1
correspondence between linear MLE codes and linear perfect
error-correcting codes. Therefore we expect that there are also
corresponding relations between nonlinear MLE codes and nonlinear
perfect codes.

The Hamming bound for error-correcting codes (Lemma 8) and the
definition of perfect codes have general forms as follows. A
$t$-error-correcting code over GF(q) of length $n$ containing $M$
codewords must satisfy

$$M\left(1 + (q-1)\binom{n}{1} + (q-1)^2\binom{n}{2} + \cdots + (q-1)^t\binom{n}{t}\right) \le q^n. \tag{7}$$

If equality holds in (7), the $t$-error-correcting code over GF(q) of
length $n$ containing $M$ codewords is called a perfect code. It can
be proved that the number of codewords $M$ of a perfect code is a
power of $q$ [8].

The following two theorems show the relations between MLE
codes and perfect codes. We provide two constructive proofs,
which can be used to construct MLE codes from perfect codes or
perfect codes from MLE codes.

Theorem 14: If $\mathcal{P}$ is a $t$-error-correcting
($0 \le t \le n$) perfect code over GF(q) of length $n$ containing
$q^{n-k}$ ($0 \le k \le n$) codewords, then there exists an
$(n, q^k, t)$ MLE code $\mathcal{S} = \{I_0, I_1, \ldots, I_{q^k-1}\}$
over GF(q) such that $\mathcal{P}$ equals some $I_i$
($0 \le i \le q^k - 1$).

Proof: Let $\mathcal{P} = \{x_1, x_2, \ldots, x_{q^{n-k}}\}$ be a
$t$-error-correcting perfect code of length $n$ containing $q^{n-k}$
codewords. Then the minimum distance of $\mathcal{P}$ must be larger
than $2t$ and
$q^{n-k}\left(1 + (q-1)\binom{n}{1} + \cdots + (q-1)^t\binom{n}{t}\right) = q^n$.
Therefore the number of vectors whose weights are not larger than $t$
satisfies

$$1 + (q-1)\binom{n}{1} + (q-1)^2\binom{n}{2} + \cdots + (q-1)^t\binom{n}{t} = q^k. \tag{8}$$

Write these vectors as $y_0, \ldots, y_{q^k-1}$ and assume that $y_0$ is
the zero vector. Denote the sphere of radius $t$ around $x_i$ by
$O_t(x_i)$, i.e.

$$O_t(x_i) = \{y_j + x_i : 0 \le j \le q^k - 1\} \quad (1 \le i \le q^{n-k}).$$

These $q^{n-k}$ spheres are disjoint because $\mathcal{P}$ is a
$t$-error-correcting code.

Now construct the stego-code $\mathcal{S} = \{I_0, I_1, \ldots,
I_{q^k-1}\}$ as follows:

$$I_i = \{y_i + x_j : 1 \le j \le q^{n-k}\}, \quad 0 \le i \le q^k - 1. \tag{9}$$

We claim that $\{I_0, I_1, \ldots, I_{q^k-1}\}$ is a partition of
$GF^n(q)$. In fact, any two of the $q^k$ subsets are disjoint.
Otherwise, if two subsets, e.g. $I_0$ and $I_1$, intersect, then there
exist $i \ne j$ such that $y_0 + x_i = y_1 + x_j$, which implies
$O_t(x_i) \cap O_t(x_j) \ne \emptyset$, contradicting the fact that the
spheres $O_t(x_i)$ are disjoint. Furthermore, note that every $I_i$
($0 \le i \le q^k - 1$) contains $q^{n-k}$ vectors. Therefore
$GF^n(q) = I_0 \cup I_1 \cup \cdots \cup I_{q^k-1}$.

Now to prove that $\{I_0, I_1, \ldots, I_{q^k-1}\}$ is a stego-code, the
only thing we should verify is that for any $z \in GF^n(q)$, the sphere
of radius $t$ around $z$, i.e. $O_t(z) = \{z_j : z_j = z + y_j \text{
and } 0 \le j \le q^k - 1\}$, intersects every $I_i$
($0 \le i \le q^k - 1$). Otherwise, there must exist some subset, e.g.
$I_h$, that shares at least two vectors with $O_t(z)$, because $O_t(z)$
includes only $q^k$ vectors. For instance, if there are
$0 \le i_1 < i_2 \le q^k - 1$ such that $z_{i_1} \in I_h$ and
$z_{i_2} \in I_h$, then there exist distinct $j_1, j_2$ with
$1 \le j_1, j_2 \le q^{n-k}$ such that $z_{i_1} = y_h + x_{j_1}$ and
$z_{i_2} = y_h + x_{j_2}$. Therefore, on one hand,
$\mathrm{Dist}(z_{i_1}, z_{i_2}) = \mathrm{Dist}(z + y_{i_1}, z + y_{i_2})
= \mathrm{Dist}(y_{i_1}, y_{i_2}) \le 2t$, but on the other hand,
$\mathrm{Dist}(z_{i_1}, z_{i_2}) = \mathrm{Dist}(y_h + x_{j_1}, y_h +
x_{j_2}) = \mathrm{Dist}(x_{j_1}, x_{j_2}) > 2t$, and a contradiction
follows. So we have proved that $\{I_0, I_1, \ldots, I_{q^k-1}\}$ is an
$(n, q^k, t)$ stego-code, and it is an MLE code because (8) holds.
Finally, (9) means $I_0 = \mathcal{P}$, because $y_0$ is the zero
vector. ■
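
The construction in this proof can be tried out on a small case. The sketch below takes the binary [7,4] Hamming code as the perfect code (an illustrative choice, with $t = 1$ and $q^k = 8$), lists the vectors of weight at most $t$ with the zero vector first, and forms the classes $I_i$ as translates of the code. The variable names `P`, `ball` and `classes` are hypothetical.

```python
from itertools import product

H = [(0, 0, 0, 1, 1, 1, 1), (0, 1, 1, 0, 0, 1, 1), (1, 0, 1, 0, 1, 0, 1)]
n, t = 7, 1

def add(x, y):
    """Vector addition over GF(2)."""
    return tuple(a ^ b for a, b in zip(x, y))

space = list(product((0, 1), repeat=n))

# Perfect code P: the [7,4] Hamming code, i.e. the null space of H.
P = [x for x in space
     if all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H)]

# y_0, ..., y_{q^k - 1}: all vectors of weight <= t, zero vector first.
ball = sorted((y for y in space if sum(y) <= t), key=sum)

# Classes I_i = {y_i + x_j : x_j in P}, one translate of P per y_i.
classes = [frozenset(add(y, x) for x in P) for y in ball]
```

Here `classes[0]` is the Hamming code itself, the eight classes partition $GF^7(2)$, and every vector is within distance 1 of every class, so the result is a $(7, 2^3, 1)$ MLE code.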

Theorem 15: If $\mathcal{S} = \{I_0, I_1, \ldots, I_{q^k-1}\}$ is an
$(n, q^k, t)$ MLE code over GF(q), then every $I_i$
($0 \le i \le q^k - 1$) is a $t$-error-correcting perfect code over
GF(q) of length $n$ containing $q^{n-k}$ codewords.

Proof: Let $\mathcal{S} = \{I_0, I_1, \ldots, I_{q^k-1}\}$ be an
$(n, q^k, t)$ MLE code over GF(q). The proof of Lemma 13 implies that
every $I_i$ ($0 \le i \le q^k - 1$) contains $q^{n-k}$ vectors. Now we
prove that any $I_i$, e.g. $I_0$, is a $t$-error-correcting code. In
fact, for any two vectors $x_1, x_2 \in I_0$, the spheres of radius $t$
around them, i.e. $O_t(x_1)$ and $O_t(x_2)$, are disjoint. Otherwise,
if there exists $z \in O_t(x_1) \cap O_t(x_2)$, then the sphere of
radius $t$ around $z$ shares two vectors with $I_0$, which contradicts
Lemma 12. Therefore $I_0$ is a $t$-error-correcting code of length $n$
containing $q^{n-k}$ codewords. Furthermore, because $\mathcal{S}$ is an
MLE code,
$q^{n-k}\left(1 + (q-1)\binom{n}{1} + \cdots + (q-1)^t\binom{n}{t}\right)
= q^{n-k} q^k = q^n$, which implies that $I_0$ is a perfect code. ■

Theorem El and 1151 show that there is a corresponding relation 
between perfect codes and MLE codes in equivalent sense. And in 
fact these two theorems imply that the classifications of MLE codes 
can be determined by the classifications of perfect codes. 

There are three kinds of trivial perfect codes: a code containing
just one codeword, the whole space, or a binary repetition code
of odd length. We call the corresponding MLE codes trivial
MLE codes as well, i.e. the $(n, q^n, n)$ or $(n, 1, 0)$ code over GF(q), or the binary
$(2t+1, 2^{2t}, t)$ code, all of which can be constructed by Theorem 14.

The work of Tietäväinen [9] shows that there are only three kinds
of parameters $n$, $M$ and $d$ for nontrivial perfect codes.

1) The binary $(23, 2^{12}, 7)$ Golay code (linear three-error-
correcting code) which is unique in the sense of equivalence.

2) The ternary $(11, 3^6, 5)$ Golay code (linear two-error-correcting
code) which is unique in the sense of equivalence.

3) The $\left(n, q^{n-r}, 3\right)$ code over GF(q) with $n = \frac{q^r-1}{q-1}$ (single-error-
correcting code). All linear perfect codes with these parameters
are equivalent, i.e. the Hamming codes. And there exist non-
linear perfect codes with these parameters over GF(q) for all
$q$.

Correspondingly, Theorems 14 and 15 imply that there are also only
three kinds of possible parameters $n$, $M$ and $t$ for nontrivial MLE
codes.

Corollary 16: A nontrivial MLE code must belong to one of the following
three types:

1) The binary linear $(23, 2^{11}, 3)$ code. All MLE codes with these
parameters are equivalent.

2) The ternary linear $(11, 3^5, 2)$ code. All MLE codes with these
parameters are equivalent.

3) The $\left(\frac{q^r-1}{q-1}, q^r, 1\right)$ code over GF(q). All linear MLE codes
with these parameters are equivalent. And there exist nonlinear
MLE codes with these parameters over GF(q) for all $q$.
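The three parameter families can be checked against the sphere-packing equality $q^{n-k}\sum_{i=0}^{t}(q-1)^i\binom{n}{i} = q^n$ used in the proof of Theorem 15. The short Python sketch below does this; the function names are hypothetical, and the check is only a necessary condition on the parameters, not a proof that a code exists.

```python
from math import comb

def sphere_size(n, t, q):
    """Number of vectors of GF(q)^n within Hamming distance t of a fixed vector."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))

def is_mle_parameters(n, k, t, q):
    """True when q^(n-k) * |sphere| == q^n, i.e. when |sphere| == q^k."""
    return sphere_size(n, t, q) == q ** k

# Parameters (n, k, t, q) of the two Golay-type MLE codes from Corollary 16.
GOLAY_BINARY = (23, 11, 3, 2)
GOLAY_TERNARY = (11, 5, 2, 3)

def hamming_parameters(q, r):
    """Parameters of the Hamming-type MLE code with n = (q^r - 1)/(q - 1)."""
    return ((q ** r - 1) // (q - 1), r, 1, q)
```

For example, `is_mle_parameters(*GOLAY_BINARY)` holds because $1 + 23 + 253 + 1771 = 2^{11}$.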

For the security of steganographic systems, we hope that there are
enough stego-codes, especially binary codes. The following
corollary shows that there are indeed very many binary MLE codes. In
fact, Krotov [10] proved that there are at least

$$2^{2^{\frac{n+1}{2} - \log_2(n+1)}} \cdot 3^{2^{\frac{n-3}{4}}} \cdot 2^{2^{\frac{n+5}{4} - \log_2(n+1)}}$$

different perfect binary codes of length $n$ ($n = 2^r - 1$). Therefore,
with Theorem 1 141 and 1131 we can obtain the following lower bound 
for length n binary MLE codes. 



Corollary 17: There are at least

$$2^{2^{\frac{n+1}{2} - \log_2(n+1)}} \cdot 3^{2^{\frac{n-3}{4}}} \cdot 2^{2^{\frac{n+5}{4} - \log_2(n+1)}}$$

different MLE binary codes of length $n$, where $n = 2^r - 1$.

So far many constructions of nonlinear perfect binary codes have been
proposed; with them and Theorem 14 we can construct the
corresponding MLE binary codes.

IV. Hiding Redundancy - The Performance of 
Stego-codes 

Usually the performance of an encoding method for steganography
is measured by "message rate", "change density" or "embedding
efficiency". For example, for sequential LSB steganography on
images, the message rate is 100% (the LSB of every
pixel carries one message bit), the change density is 50% (on average
50% of the pixels need not be changed), and so the embedding efficiency
is 2 (on average 2 bits are embedded per change). However, each of these three
measures reflects only one aspect of the problem. In
fact, the user hopes to get the maximum message rate within a proper
constraint on the change density, which is just the so-called hiding
capacity. Therefore the difference between the hiding capacity and the
message rate, which we call "hiding redundancy" in this paper,
soundly reflects the capability of a stego-code. To introduce the
concept of hiding redundancy, the following preparations are needed.
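The three measures for sequential LSB steganography can be reproduced with a small simulation. The Python sketch below is only illustrative: the 8-bit pixel range, the uniformly random cover and message, the fixed seed and the function name are all assumptions made here.

```python
import random

def lsb_embed_stats(num_pixels, seed=0):
    """Embed one random bit into the LSB of every pixel and return
    (message rate, change density, embedding efficiency)."""
    rng = random.Random(seed)
    cover = [rng.randrange(256) for _ in range(num_pixels)]
    message = [rng.randrange(2) for _ in range(num_pixels)]
    # Replace the least significant bit of each pixel by the message bit.
    stego = [(p & ~1) | b for p, b in zip(cover, message)]
    changes = sum(c != s for c, s in zip(cover, stego))
    message_rate = len(message) / num_pixels    # bits per pixel
    change_density = changes / num_pixels       # changed pixels per pixel
    efficiency = len(message) / changes         # bits per change
    return message_rate, change_density, efficiency
```

For a large number of pixels the returned values are close to 1, 0.5 and 2 respectively, matching the figures quoted above.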

We use the following notation. Random variables are denoted by
capital letters (e.g. $X$), and their realizations by the respective lower case
letters (e.g. $x$). The domains over which random variables are defined
are denoted by script letters (e.g. $\mathcal{X}$). Sequences of $N$ random
variables are denoted with a superscript (e.g. $X^N = (X_1, X_2, \cdots, X_N)$,
which takes its values on the product set $\mathcal{X}^N$). We denote entropy
and conditional entropy by $H(\cdot)$ and $H(\cdot|\cdot)$ respectively.



Assume that the cover-objects $X^N$ are independent and identically
distributed (i.i.d.) samples from $P(x)$. Because the embedded message
$M$ usually is cipher text, we assume that it is uniformly distributed
and independent of $X^N$. $M$ is hidden in $X^N$, under the control of
a secret stego-key $K$, producing the stego-object $\hat{X}^N$.

A formal definition of a steganographic system (abbreviated
stegosystem) is presented by Moulin [11]. First of all, the embedding
algorithm of a stegosystem should keep transparency, which can be
guaranteed by some distortion constraint. A distortion function is a
nonnegative function $d : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^+ \cup \{0\}$, which can be extended
to one on $N$-tuples by $d(x^N, y^N) = \frac{1}{N}\sum_{i=1}^{N} d(x_i, y_i)$. A length-$N$
stegosystem$^1$ subject to distortion $D$ is a triple $(\mathcal{M}, f_N, \phi_N)$,
where $\mathcal{M}$ is the message set, $f_N : \mathcal{X}^N \times \mathcal{M} \times \mathcal{K} \to \mathcal{X}^N$ is
the embedding algorithm subject to the distortion constraint $D$, and
$\phi_N : \mathcal{X}^N \times \mathcal{K} \to \mathcal{M}$ is the extracting algorithm.

A cover channel is a conditional p.m.f. (probability mass function)
$q(\hat{x}|x) : \mathcal{X} \to \mathcal{X}$. Denote the set of cover channels subject to
distortion $D$ by $\mathcal{Q}$. Furthermore, define the message rate as $R_m =
\frac{1}{N}\log|\mathcal{M}|$ and the probability of error as $P_{e,N} = P(\phi_N(\hat{X}^N, K) \ne M)$.
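To make the embedding and extracting algorithms concrete, the following Python sketch instantiates them with the $(2^k - 1, k, 1)$ matrix encoding of F5 for $N = 2^k - 1$. It is a sketch under stated assumptions: the stego-key $K$ is omitted, the parity-check column of position $j$ is taken to be the binary expansion of $j$, messages are integers in $0..2^k - 1$, and the names `embed`, `extract` and `distortion` are hypothetical.

```python
def extract(stego):
    """Extracting algorithm: XOR of the 1-based indices of the positions
    that hold a 1, i.e. the syndrome of the stego vector."""
    s = 0
    for j, bit in enumerate(stego, start=1):
        if bit:
            s ^= j
    return s

def embed(cover, message):
    """Embedding algorithm: flip at most one bit of the cover so that
    extract(result) == message. Requires len(cover) == 2^k - 1 and
    0 <= message < 2^k."""
    stego = list(cover)
    diff = extract(cover) ^ message
    if diff:
        stego[diff - 1] ^= 1   # flipping position `diff` shifts the syndrome by diff
    return stego

def distortion(x, y):
    """Hamming distortion extended to N-tuples: (1/N) * sum of d(x_i, y_i)."""
    return sum(a != b for a, b in zip(x, y)) / len(x)
```

In this sketch the message rate is $k/N$, extraction never fails, and the distortion of any single embedding is at most $1/N$.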

The hiding capacity is the supremum of all achievable message rates
of stegosystems subject to distortion $D$ under the condition of vanishing
probability of error (i.e. $P_{e,N} \to 0$ as $N \to \infty$). When disregarding
the active attacker, the results of [11], [12] imply that the expression
of hiding capacity for a stegosystem can be given by



$$C(D) = \max_{q(\hat{x}|x) \in \mathcal{Q}} H(\hat{X}|X) \qquad (10)$$



Because $C(D)$ is the maximum of the conditional entropy over
all cover channels subject to $D$, it reflects the hiding ability
of the cover-object within the distortion constraint. So we refer to
$C(D) - R_m$ as the hiding redundancy of cover-objects, which
reflects the hiding capability of a stegosystem. We have assumed
that the embedded message is uniformly distributed and independent
of $X^N$, which means that there are uniformly distributed values at
the positions to be changed. Then an $(n, k, t)$ stego-coding function
and a corresponding encoding algorithm compose a stegosystem
with message rate $\frac{k}{n}$. When Hamming distance is used as the
distortion function, the average distortion is just the change density.
Note, however, that $\frac{t}{n}$ is the maximum distortion, and the computation
of the average distortion relies on the encoding algorithm. For the
linear $(n, k, t)$ stego-code over GF(2), as mentioned in Sect. II, its
encoding algorithm can be formulated as a table consisting of $2^k$
$n$-dimensional vectors. Let $a_i$, where $0 \le i \le t$, be the number of
vectors of weight $i$ in the table. Then the average distortion (change
density) of this code is $\frac{1}{n}\sum_{i=1}^{t} a_i \frac{i}{2^k}$. For instance, the average
distortion of the $(2^k - 1, k, 1)$ stego-code in F5 (introduced in an earlier example) equals

$$\frac{1}{2^k - 1}\left[1 \cdot \frac{0}{2^k} + (2^k - 1) \cdot \frac{1}{2^k}\right] = \frac{1}{2^k}.$$
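The table-based computation of the average distortion can be sketched in Python as follows. The table layout is an assumption made for illustration: each syndrome $s$ of the $(2^k - 1, k, 1)$ code is mapped to the minimum-weight change vector that has a single 1 at position $s$, and the function names are hypothetical. Exact fractions are used so that the result can be compared with $\frac{1}{2^k}$ without rounding.

```python
from fractions import Fraction

def embedding_table(k):
    """Table of 2^k change vectors of length n = 2^k - 1: syndrome 0 needs
    no change, syndrome s > 0 is produced by flipping position s (1-based)."""
    n = 2 ** k - 1
    table = {0: (0,) * n}
    for s in range(1, 2 ** k):
        table[s] = tuple(1 if j == s else 0 for j in range(1, n + 1))
    return table

def average_distortion(table, n, k):
    """(1/n) * sum_{i=1..t} a_i * i / 2^k, where a_i counts the table
    vectors of weight i and t is the largest weight in the table."""
    t = max(sum(v) for v in table.values())
    a = [sum(1 for v in table.values() if sum(v) == i) for i in range(t + 1)]
    return Fraction(sum(i * a[i] for i in range(1, t + 1)), n * 2 ** k)
```

With `k = 3` the table holds one vector of weight 0 and seven of weight 1, and the average distortion is $\frac{7}{7 \cdot 8} = \frac{1}{8}$.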

It is hard to compute the hiding capacity for general cover-objects.
Now consider the Bernoulli($\frac{1}{2}$)-Hamming case: the set of symbols of
cover-objects is $\mathcal{X} = \{0, 1\}$, and the sequence of cover-objects
$X^N$ satisfies the Bernoulli($\frac{1}{2}$) distribution; the distortion function is
Hamming distance, i.e. $d(x, y) = x \oplus y$. The hiding capacity for this
case has been given by [12].



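Lemma 18 [12]: For the Bernoulli($\frac{1}{2}$)-Hamming case with distortion
constraint $D$, the hiding capacity is

$$C(D) = \begin{cases} H(D) & \text{if } 0 \le D \le \frac{1}{2} \\ 1 & \text{if } D > \frac{1}{2} \end{cases}$$

where $H(D) = -D\log_2 D - (1 - D)\log_2(1 - D)$.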
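The capacity expression of Lemma 18 is easy to evaluate numerically. A minimal Python sketch follows; the function names are hypothetical, and the convention $0 \log_2 0 = 0$ is assumed at the endpoints.

```python
from math import log2

def binary_entropy(d):
    """H(D) = -D log2 D - (1 - D) log2 (1 - D), with 0 * log2(0) taken as 0."""
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -d * log2(d) - (1 - d) * log2(1 - d)

def hiding_capacity(d):
    """C(D) for the Bernoulli(1/2)-Hamming case with distortion constraint D."""
    if d < 0:
        raise ValueError("distortion must be nonnegative")
    return binary_entropy(d) if d <= 0.5 else 1.0
```

The capacity grows from 0 at $D = 0$ to 1 bit per cover symbol at $D = \frac{1}{2}$ and stays at 1 beyond that point.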

$^1$ In [11] the term "information hiding code" is used. To distinguish
the problem of this paper from that of [11], we replace it by "stegosystem".

Fig. 1. The middle curve is obtained by connecting the points
$\left(\frac{1}{2^k}, \frac{k}{2^k - 1}\right)$. The difference between the curve of $H(D)$ and the middle curve is the hiding redundancy
of F5; and the difference between the curve of $H(D)$ and the straight line $2D$
is the hiding redundancy of simple LSB steganography.



The LSBs of images approximately satisfy the Bernoulli($\frac{1}{2}$) distribution.
So we take LSB steganography as a benchmark, i.e. we apply stego-
codes to LSB steganography, to compare the performance of different
stego-codes.

Example 3 (Hiding Redundancy of Stego-codes): For the simple
LSB steganography, the message rate is $2D$ when the distortion is $D$ and
$0 \le D \le \frac{1}{2}$, therefore the hiding redundancy is $H(D) - 2D$. On
the other hand, for the $(2^k - 1, k, 1)$ stego-code in F5, the message
rate is $\frac{k}{2^k - 1}$, the distortion is $\frac{1}{2^k}$, and then the hiding redundancy is
$H\left(\frac{1}{2^k}\right) - \frac{k}{2^k - 1}$. Fig. 1 shows that F5 is better than simple LSB
steganography, because the hiding redundancy of F5 is smaller.
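The comparison in Example 3 can be tabulated directly. The Python sketch below evaluates both redundancies at the same distortion $D = \frac{1}{2^k}$; the function names are hypothetical, and the code simply restates the two expressions from the example.

```python
from math import log2

def binary_entropy(d):
    """H(D) with the convention 0 * log2(0) = 0."""
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -d * log2(d) - (1 - d) * log2(1 - d)

def lsb_redundancy(d):
    """Hiding redundancy H(D) - 2D of simple LSB steganography, 0 <= D <= 1/2."""
    return binary_entropy(d) - 2 * d

def f5_redundancy(k):
    """Hiding redundancy H(1/2^k) - k/(2^k - 1) of the (2^k - 1, k, 1) code."""
    return binary_entropy(1 / 2 ** k) - k / (2 ** k - 1)

def compare(k_values):
    """Rows (k, D, redundancy of LSB at D, redundancy of F5) for D = 1/2^k."""
    return [(k, 1 / 2 ** k, lsb_redundancy(1 / 2 ** k), f5_redundancy(k))
            for k in k_values]
```

For $k = 1$ the two schemes coincide and both redundancies are 0; for every $k \ge 2$ the F5 value is strictly smaller, since $\frac{k}{2^k - 1} > \frac{2}{2^k}$.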

Furthermore, by Lemma 18 we can get another bound on the
length of binary stego-codes.

Theorem 19: An $(n, k, t)$ stego-code over GF(2) such that $\frac{t}{n} \le \frac{1}{2}$
must satisfy

$$\frac{k}{n} \le H\left(\frac{t}{n}\right).$$

Proof: For any given $(n, k, t)$ stego-code over GF(2), assume its
average distortion (change density) is $D$. By the definition of capacity,
the message rate $\frac{k}{n}$ is not larger than the hiding capacity $C(D)$. And
when $\frac{t}{n} \le \frac{1}{2}$, we have $H(D) \le H\left(\frac{t}{n}\right)$ because $D \le \frac{t}{n}$ (note that
$\frac{t}{n}$ is the maximum distortion). Apply this code to the cover-object
satisfying the Bernoulli($\frac{1}{2}$) distribution, and Lemma 18 implies that
$\frac{k}{n} \le C(D) = H(D) \le H\left(\frac{t}{n}\right)$. ■
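Theorem 19 gives a quick numerical test for candidate parameters. The following Python sketch is an illustration with hypothetical function names; it checks only the entropy bound, so passing it does not by itself guarantee that a code with those parameters exists.

```python
from math import log2

def binary_entropy(p):
    """H(p) with the convention 0 * log2(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def satisfies_entropy_bound(n, k, t):
    """Check k/n <= H(t/n) for binary (n, k, t) parameters with t/n <= 1/2."""
    if 2 * t > n:
        raise ValueError("Theorem 19 assumes t/n <= 1/2")
    return k / n <= binary_entropy(t / n)
```

For example, the F5 parameters $(2^k - 1, k, 1)$ with $k \ge 2$ and the binary Golay-type parameters $(23, 11, 3)$ pass the test, whereas $(10, 9, 1)$ fails it, so no binary stego-code with the latter parameters can exist.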

In particular, for linear binary stego-codes, combining Theorem 5 and
Theorem 19 we get the following interesting result, which seems hard to
obtain directly from an algebraic point of view.

Corollary 20: If the $t$th ($1 \le t \le k$) dimension of the vector space
$GF^k(2)$ over GF(2) is $n$ and $\frac{t}{n} \le \frac{1}{2}$, then

$$\frac{k}{n} \le H\left(\frac{t}{n}\right).$$



V. Conclusions 



In this paper, we formally define the stego-code, which poses a new
coding problem, and study the construction and properties of this
kind of code. However, there are still many interesting problems
on this topic, such as the estimation of the $t$th dimension and the
construction of a minimum set of $t$th generators of $GF^k(q)$, other
bounds on the length of stego-codes, the construction of fast encoding
algorithms, the construction of codes that approach the hiding
capacity, and further relations between stego-codes and error-
correcting codes. Further research also includes the applications of
stego-codes in other possible fields.